Abstract: Link and node failures are two common, fundamental problems that affect operational networks. Hence, protection of communication networks is essential to increase their reliability, performance, and operations. Much research has been done to protect against link and node failures and to provide reliable solutions based on pre-defined provisioning or dynamic restoration of the domain. In this paper we develop network protection strategies against multiple link failures using network coding and joint capacities. In these strategies, the source nodes apply network coding to their transmitted data to provide backup copies for recovery at the receiver nodes. Such techniques can be applied to optical, IP, and mesh networks. The encoding operations of protection codes are defined over finite fields. Furthermore, the normalized capacity of the communication network is given by $(n - t)/n$ in case of $t$ link failures. In addition, a bound on the minimum required field size is derived.



I. Introduction 

With the increase in the capacity of backbone networks, 
the failure of a single link or node can result in the loss 
of enormous amounts of information, which may lead to 
catastrophes, or at least loss of revenue. Network connections 
are therefore provisioned with the property that they can 
survive such failures, and hence several techniques have been 
introduced in the literature. Such techniques either add extra 
resources, or reserve some of the available network resources 
as backup circuits, just for the sake of recovery from failures. 
Recovery from failures is also required to be agile in order 
to minimize the network outage time. This recovery usually 
involves two steps: fault diagnosis and location, and rerouting 
connections. Hence, the optimal network survivability problem 
is a multi-objective problem in terms of resource efficiency, 
operation cost, and agility [9]. 

In network survivability, the four types of failures that might affect network operations are [7], [10]: 1) link failure, 2) node failure, 3) shared risk link group (SRLG) failure, and 4) network control system failure. Hence, one needs to design network protection strategies against these types of failures. Although link failures are the most frequent, node failures sometimes happen due to a burned switch or router, fire, or other hardware damage. In addition, a failure might be due to network maintenance.

Network coding allows the intermediate nodes not only to forward packets using network scheduling algorithms, but also to encode and decode them using primitive algebraic operations; see [1], [3], [4], [8] and the references therein. As an application of network coding, data loss caused by failures in communication links can be detected and recovered if the sources are allowed to perform network coding operations.

Recently, network protection strategies against multiple link failures using network coding and reduced capacities were proposed in [2], [5]. In this paper, we provide a new technique for protecting against network failures using protection codes and reduced capacity, in which the encoding operations are defined over finite fields. This technique can be deployed at an overlay layer in optical mesh networks, in which detecting failures is an essential task. The benefits of this approach are that:

i) It allows receivers to recover the lost data without contacting a third party or main domain server.

ii) It has less computational complexity and does not require adding extra paths.

iii) All $n$ disjoint paths have full capacity, except $t$ paths in case of protecting against $t$ link failures.

This paper is organized as follows. In Sections II and III we present the network model and problem definition. In Section IV we provide network protection against $t$ link failures. Bounds on the finite field size are proved in Section V. We present differentiated distributed capacities in Section VI and analyze the protection codes in Section VII. Finally, the paper is concluded in Section VIII.

II. Network Model and Assumptions 

In this section we introduce the network model and provide 
the needed assumptions. The main hypothesis of this network 
model can be stated as follows. 

i) Let $\mathcal{N}$ be a network represented by an abstract graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of undirected edges. Let $S$ and $R$ be sets of independent sources and destinations, respectively. The set $\mathcal{V} = V \cup S \cup R$ contains the relay nodes, sources, and destinations. Assume for simplicity that $|S| = |R| = n$; hence the number of sources is equal to the number of receivers.

ii) A node can be a router, switch, or end terminal, depending on the network model $\mathcal{N}$ and the transmission layer.

iii) $L$ is a set of links $L = \{L_1, L_2, \ldots, L_n\}$ carrying the data from the sources to the receivers, as shown in Fig. 1. All connections have the same bandwidth; otherwise, a connection with high bandwidth can be divided into multiple connections, each of which has a unit capacity. There are exactly $n$ connections. For simplicity, we assume that the number of sources is less than or equal to the number of links. A sender with a high capacity can divide its capacity into multiple unit capacities, each of which has its own link. Put differently,

\[ L_i = \{(s_i, w_{1i}), (w_{1i}, w_{2i}), \ldots, (w_{(\lambda)i}, r_i)\}, \qquad (1) \]

where $1 \le i \le n$ and $(w_{(j-1)i}, w_{ji}) \in E$, for some integer $\lambda \ge 1$. Hence we have $|S| = |R| = |L| = n$. The $n$ connection paths are pairwise link disjoint.

Fig. 1. Network protection against a single path failure using reduced capacity and network coding. One path out of $n$ primary paths carries encoded data. The black points represent various other relay nodes.

iv) The data from all sources are sent in cycles. Each cycle has a number of time slots $n$. Hence $t_\delta^j$ is the value at round time slot $j$ in cycle $\delta$.

v) A failure on a link $L_i$ may happen due to network circumstances such as a link replacement, overhead, etc. We assume that the receiver is able to detect a failure and that our protection strategy is able to recover from it.

vi) In this model $\mathcal{N}$, if we consider only a single link failure, it is sufficient to apply the encoding and decoding operations over a finite field with two elements, denoted $\mathbb{F}_2 = \{0, 1\}$.
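
Assumption vi) can be illustrated with a minimal sketch. It assumes, for illustration only, that each data unit is a small integer whose bits are treated as independent symbols of $\mathbb{F}_2$, so that addition over $\mathbb{F}_2$ becomes a bitwise XOR. The function names `encode_single` and `recover_single` are hypothetical and do not come from the paper.

```python
from functools import reduce
from operator import xor


def encode_single(data_units):
    """Protection symbol: the sum over F_2 (bitwise XOR) of all plain data units."""
    return reduce(xor, data_units, 0)


def recover_single(received, protection):
    """Restore the one missing data unit (marked None) from the protection symbol."""
    missing = [i for i, value in enumerate(received) if value is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one failed working path")
    survivors = (value for value in received if value is not None)
    restored = list(received)
    # XOR-ing the protection symbol with every surviving unit leaves the lost one.
    restored[missing[0]] = reduce(xor, survivors, protection)
    return restored


data = [0b1011, 0b0110, 0b1110]   # data units on the working paths
y = encode_single(data)           # carried on the single protection path
print(recover_single([0b1011, None, 0b1110], y))  # prints [11, 6, 14]
```

Because every element of $\mathbb{F}_2$ is its own additive inverse, the same XOR operation serves for both encoding and recovery, which is why no larger field is needed for a single failure.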

III. Problem Setup and Terminology 

We assume that there is a set of $n$ connections that need to be protected with a 100% guarantee against single and multiple link failures. We assume that all connections have the same bandwidth, and each link (one hop or circuit) has the same bandwidth as a path.

Every sender $s_i$ prepares a packet $\mathrm{packet}_{s_i \to r_i}$ to send to a receiver $r_i$. The packet contains the sender's ID, data $x_i^\ell$, and a round time $t_\delta^\ell$ for every cycle, for some integers $\delta$ and $\ell$. There are two types of packets:

i) Plain Packets: Packets sent without coding, in which the sender does not need to perform any coding operations. In this case, the sender $s_i$ sends the following packet to the receiver $r_i$:

\[ \mathrm{packet}_{s_i \to r_i} := (ID_{s_i}, x_i^\ell, t_\delta^\ell). \qquad (2) \]

ii) Encoding Packets: Packets sent with encoded data, in which the sender $s_j$ sends other senders' data. In this case, the sender $s_j$ sends the following packet to the receiver $r_j$:

\[ \mathrm{packet}_{s_j \to r_j} := \Big(ID_{s_j}, \sum_{i=1,\, i \ne j}^{n} a_i x_i^\ell, t_\delta^\ell\Big), \qquad (3) \]

where $a_i \in \mathbb{F}_q$.
In either case the sender has full capacity in the connection link $L_i$.

Definition 1: The capacity of a connecting link $L_i$ between $s_i$ and $r_i$ is defined by

\[ c_i = \begin{cases} 1, & L_i \text{ has active signals;} \\ 0, & \text{otherwise.} \end{cases} \qquad (4) \]

The total capacity is given by the summation of all link capacities. By an active link we mean that the receiver is able to receive un-encoded signals/messages through this link and process them.

Clearly, if all links are active, then the total capacity is $n$ and the normalized capacity is 1. In general, the normalized capacity of the network for the active and failed links is computed by

\[ C_{\mathcal{N}} = \frac{1}{n} \sum_{i=1}^{n} c_i. \qquad (5) \]
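
The normalized capacity can be computed directly from Definition 1. The sketch below is illustrative only: the name `normalized_capacity` and the representation of link states as a list of booleans are assumptions, with `True` meaning that the link carries un-encoded (active) signals.

```python
from fractions import Fraction


def normalized_capacity(link_active):
    """Average of the link capacities c_i, where c_i = 1 for an active link and 0 otherwise."""
    n = len(link_active)
    if n == 0:
        raise ValueError("the network must have at least one connection")
    return Fraction(sum(1 for active in link_active if active), n)


# n = 5 connections, of which t = 2 carry encoded data in this round.
print(normalized_capacity([True, True, True, False, False]))  # prints 3/5
```

With $t$ of the $n$ links carrying encoded data, the result is $(n - t)/n$, which is the value stated in the abstract.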

The following definition describes the working and protection paths between two network switches, as shown in Fig. 1.

Definition 2: The working paths on a network with $n$ connection paths carry un-encoded traffic under normal operations. The protection paths provide alternate backup paths to carry encoded traffic. A protection scheme ensures that data sent from the sources will reach the receivers in case of failures on the working paths.

IV. NPS-T: Protecting Against t Path Failures 

In this section we present a network protection strategy against $t$ failures in optical networks. Assume that the notation of the previous sections holds. Assume also that the total number of failures is $t$ and that they happen at $t$ arbitrary links.

Let $m = \lceil n/t \rceil$; hence we have $m$ rounds per cycle. The encoding operations of NPS-T against $t$ failures are shown in Scheme (6). We can see that $y_\ell$ in general is given by

\[ y_\ell = \sum_{i=1}^{(j-1)t} a_i^\ell x_i^{j-1} + \sum_{i=jt+1}^{n} a_i^\ell x_i^{j} \quad \text{for } (j-1)t + 1 \le \ell \le jt,\ 1 \le j \le n. \qquad (7) \]
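
The index ranges in (7) imply a rotation of the protection role across rounds: in round $j$, the paths $(j-1)t + 1, \ldots, jt$ carry encoded data. The following sketch lists that rotation. The function name `protection_schedule` is hypothetical, and clipping the last round when $t$ does not divide $n$ is an assumption that the text does not spell out.

```python
import math


def protection_schedule(n, t):
    """Return, for each round j = 1..m, the 1-based indices of the protection paths."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    m = math.ceil(n / t)  # number of rounds per cycle
    return [
        list(range((j - 1) * t + 1, min(j * t, n) + 1))  # clipped at n (assumption)
        for j in range(1, m + 1)
    ]


print(protection_schedule(6, 2))  # prints [[1, 2], [3, 4], [5, 6]]
```

Each connection therefore serves as a protection path in one round per cycle when $t$ divides $n$, which is how the strategy treats all connections fairly.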

The advantages of the NPS-T approach are as follows:

• The data is encoded and decoded online, and it is sent and received in different rounds. Once the receivers detect failures, they are able to obtain a copy of the lost data immediately, without delay, by querying the neighboring nodes with unbroken working paths.

• The recovery is assured with a 100% guarantee. Since $t$ paths carry encoded data, up to $t$ failures can be recovered.

Fig. 2. The encoding scheme against $t$ link failures, with $m = \lceil n/t \rceil$, $1 \le j \le m$, and $1 \le \ell \le t$. $t$ out of the $n$ connections carry encoded data. The coefficients are chosen over $\mathbb{F}_q$, for $q \ge n - t + 1$.



• Using this strategy, no extra paths are needed. This will 
make this approach more suitable for applications, in 
which adding extra paths is not allowed. 

• Since in real scenarios the number of failures is very small in comparison to the number of working paths, NPS-T performs well.

• The encoding operations are linear, and the coefficients of the variables $x_i^\ell$ are taken from a finite field with $q \ge n - t + 1$ elements.

Theorem 3: Let $n$ be the total number of connections from sources to receivers. The capacity of the NPS-T strategy shown in Scheme (6) against $t$ path failures is given by

\[ C_{\mathcal{N}} = (n - t)/n. \qquad (8) \]



Lemma 4: The encoding Scheme (6) is optimal in terms of maximum capacity.

One cannot find a better encoding scheme against $t$ link failures than providing one protection path for each failure. Indeed, $t$ protection paths are used to protect against $t$ link failures, as shown in Scheme (6).

A. Encoding Operations 

Assume that each connection path $L_i \in L$ has a unit capacity from a source $s_i \in S$ to a receiver $r_i \in R$. The data sent from the sources $S$ to the receivers $R$ is transmitted in rounds. Under NPS-T, in every round $n - t$ paths are used to carry new data ($x_i^\ell$), and $t$ paths are used to carry protected data units; that is, there are $t$ protection paths. Therefore, to treat all connections fairly, there are $n/t$ rounds in a cycle, and in each round the capacity is given by $n - t$.

We consider the case in which all symbols $x_i^\ell$ belong to the same round. The first $t$ sources transmit the first encoded data units $y_1, y_2, \ldots, y_t$, and in the second round the next $t$ sources transmit $y_{t+1}, y_{t+2}, \ldots, y_{2t}$, and so on. All sources $S$ and receivers $R$ must keep track of the round numbers. Let $ID_{s_i}$ and $x_{s_i}$ be the ID and data initiated by the source $s_i$. Assume the round time $\ell$ in cycle $\delta$ is given by $t_\delta^\ell$. Then the source $s_i$ will send $\mathrm{packet}_{s_i}$ on the working path, which includes

\[ \mathrm{packet}_{s_i} = (ID_{s_i}, x_i^\ell, t_\delta^\ell). \qquad (9) \]

Also, the source $s_j$ that transmits on a protection path will send a packet $\mathrm{packet}_{s_j}$:

\[ \mathrm{packet}_{s_j} = (ID_{s_j}, y_k, t_\delta^\ell), \qquad (10) \]

where $y_k$ is defined in (7). Hence the protection paths are used to protect the data transmitted in round $\ell$, which are included in the $x_i^\ell$ data units. So, we have a system of $t$ independent equations at each round time that can be used to recover at most $t$ unknown variables.

The strategy NPS-T is a generalization of protecting against 
a single path failure shown in the previous section in which 
t protection paths are used instead of one protection path in 
case of one failure. We also notice that most of the network 
operations suffer from one and two path failures [7], [10]. 



B. Proper Coefficients Selection 

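One way to select the coefficients $a_i^\ell$ in each round such that we have a system of $t$ linearly independent equations is by using the matrix $H$ shown in (11). Let $q$ be the order of a finite field, and let $\alpha$ be a root of unity. Then we can use this matrix to define the coefficients of the senders as:

\[ H = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ 1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha^{t-1} & \alpha^{2(t-1)} & \cdots & \alpha^{(t-1)(n-1)} \end{bmatrix} \qquad (11) \]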


We have the following assumptions about the encoding operations.

1) Clearly, if we have one failure, $t = 1$, then all coefficients will be one. The first sender will always choose the unit value.

2) If we assume $t$ failures, then the equations $y_1, y_2, \ldots, y_t$ are written as:

\[ y_1 = \sum_{i=t+1}^{n} x_i^1, \qquad y_2 = \sum_{i=t+1}^{n} \alpha^{(i-1)} x_i^1, \qquad (12) \]

\[ y_j = \sum_{i=t+1}^{n} \alpha^{(i-1)(j-1) \bmod (q-1)} x_i^1. \qquad (13) \]



This equation gives the general scheme for choosing the coefficients at any particular round in any cycle. However, the encoded data $y_j$'s are defined as shown in Equation (13). In other words, for the first round in cycle one, the coefficients of the plain data $x_1, x_2, \ldots, x_t$ are set to zero.
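
The coefficient matrix can be generated and checked numerically. The sketch below makes several simplifying assumptions that are not part of the paper: it works over a prime field $\mathbb{F}_p$ (that is, $q = p$ with $r = 1$), it takes $\alpha$ to be a generator of the multiplicative group, and it requires $n \le p - 1$ so that the powers $\alpha^0, \ldots, \alpha^{n-1}$ are distinct. All function names are hypothetical. The check confirms that any $t$ columns are linearly independent, which is the property the encoding relies on.

```python
from itertools import combinations


def primitive_root(p):
    """Smallest generator of the multiplicative group of F_p (p prime)."""
    if p == 2:
        return 1
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("p must be prime")


def coefficient_matrix(n, t, p):
    """t x n matrix whose entry in row j, column i (0-based) is alpha^(i*j) mod p."""
    if n > p - 1:
        raise ValueError("this sketch needs n <= p - 1 distinct powers of alpha")
    alpha = primitive_root(p)
    return [[pow(alpha, (i * j) % (p - 1), p) for i in range(n)] for j in range(t)]


def rank_mod_p(rows, p):
    """Rank over F_p by Gauss-Jordan elimination (needs Python 3.8+ for pow(x, -1, p))."""
    rows = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(v - f * w) % p for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank


H = coefficient_matrix(n=6, t=3, p=7)
independent = all(
    rank_mod_p([[row[c] for c in cols] for row in H], 7) == 3
    for cols in combinations(range(6), 3)
)
print(H[1], independent)  # prints [1, 3, 2, 6, 4, 5] True
```

The first row and first column consist of ones, matching the observation that the first sender always chooses the unit value.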

C. Decoding Operations 

We know that the coefficients $a_1^\ell, a_2^\ell, \ldots, a_n^\ell$ are elements of a finite field; hence the inverses of these elements exist and are unique. Once a node fails, causing $t$ data units to be lost, and once the receivers receive $t$ linearly independent equations, they can solve these equations to obtain the $t$ unknown data units. In one particular cycle $j$, we have three cases for the failures:

i) All $t$ link failures happened in the working paths, i.e., the working paths have failed to convey the messages $x_i^\ell$ in round $\ell$. In this case, $n - t$ equations will be received, $t$ of which are linear combinations of $n - t$ data units, and the remaining $n - 2t$ are explicit $x_i$ data units, for a total of $n - t$ equations in $n - t$ data units. In this case, the $t$ equations (packets) carried by the $t$ encoded packets can be used to recover the lost data.

ii) All $t$ link failures happened in the protection paths. In this case, the remaining $n - t$ packets are carried on working paths that do not experience any failures. Therefore, no recovery operations are needed.

iii) The failures might happen in some working and protection paths simultaneously in one particular round in a cycle. The recovery can be done using any $t$ protection paths, as shown in case i.
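
Case i can be made concrete with a small worked example. The sketch assumes a prime field $\mathbb{F}_p$ with $p = 7$ and $\alpha = 3$, numbers the paths from 0, places the $t$ protection paths at indices $0, \ldots, t-1$, and uses the coefficient $\alpha^{ij}$ for data path $i$ in protection equation $j$. The helper names `solve_mod_p`, `encode`, and `recover` are hypothetical. The sketch covers only the case in which all protection packets arrive.

```python
def solve_mod_p(A, b, p):
    """Solve the square system A x = b over F_p by Gauss-Jordan elimination."""
    k = len(A)
    M = [[v % p for v in row] + [rhs % p] for row, rhs in zip(A, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col]), None)
        if pivot is None:
            raise ValueError("singular system: equations are not independent")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse, Python 3.8+
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(k):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]


def encode(data, t, alpha, p):
    """data maps a working-path index i to its symbol x_i; returns y_0, ..., y_{t-1}."""
    return [sum(pow(alpha, i * j, p) * x for i, x in data.items()) % p for j in range(t)]


def recover(received, failed, ys, alpha, p):
    """Recover the symbols of the failed working paths from the protection symbols."""
    rows = range(len(failed))  # one protection equation per lost symbol
    A = [[pow(alpha, i * j, p) for i in failed] for j in rows]
    # Subtract the contribution of the data units that did arrive.
    b = [ys[j] - sum(pow(alpha, i * j, p) * x for i, x in received.items()) for j in rows]
    return dict(zip(failed, solve_mod_p(A, b, p)))


data = {2: 5, 3: 1, 4: 6, 5: 2}            # working paths 2..5, symbols in F_7
ys = encode(data, t=2, alpha=3, p=7)       # carried on protection paths 0 and 1
print(ys, recover({2: 5, 4: 6}, [3, 5], ys, alpha=3, p=7))  # prints [0, 1] {3: 1, 5: 2}
```

The system is solvable here because the two coefficient columns for the failed paths, $(1, \alpha^3)$ and $(1, \alpha^5)$, are linearly independent.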

V. Bounds on the Finite Field Size, $\mathbb{F}_q$

In this section we derive lower and upper bounds on the alphabet size required for the encoding and decoding operations. In the proposed schemes we assume that direct connections exist between the senders and receivers, over which information can be exchanged at negligible cost.



The first result shows that the alphabet size required must be 
greater than the number of connections that carry unencoded 
data. 

Theorem 5: Let $n$ be the number of connections in the network model $\mathcal{N}$. Then the receivers are able to decode the encoded messages over $\mathbb{F}_q$ and will recover from $t \ge 2$ path failures if

\[ q \ge n - t + 1. \qquad (14) \]

Also, if $q = p^r$, then $r \le \lceil \log_p (n + 1) \rceil$. The binary field is sufficient in case of a single path failure.

Proof: We will prove the lower bound by construction. Assume an NPS-T at one particular time $t_\delta^\ell$ in the round $\ell$ in a certain cycle $\delta$. The protection code of NPS-T against $t$ path failures is given in (11).

Without loss of generality, the interpretation of Scheme (11) is as follows:

i) The columns correspond to the senders $S$, and the rows correspond to the $t$ encoded data $y_1, y_2, \ldots, y_t$.

ii) The first row corresponds to $y_1$ if we assume the first round in cycle one. Furthermore, every row represents the coefficients of every sender at a particular round.

iii) The column $i$ represents the coefficients of the sender $s_i$ through all protection paths $L_1, L_2, \ldots, L_t$.

iv) Any element $\alpha^i \in \mathbb{F}_q$ appears once in a column and row, except in the first column and first row, where all elements are ones.

v) All columns (rows) are linearly independent.

Because the $t$ failures might occur at any $t$ working paths of $L = \{L_1, L_2, \ldots, L_n\}$, we cannot predict the $t$ protection paths either. This means that $t$ out of the $n$ columns do not participate in the encoding coefficients, because $t$ paths will carry encoded data. We notice that removing any $t$ out of the $n$ columns in Scheme (11) results in $n - t$ linearly independent columns. Therefore the smallest finite field that satisfies this condition must have $n - t + 1$ elements.

The upper bound comes from the case of no failures, hence $q \ge (n + 1)$. Assume $q$ is a prime power; then the result follows. ■

If $q = 2^r$, then in general the previous bound can be stated as

\[ n - t + 1 \le q \le 2^{\lceil \log_2 (n+1) \rceil}. \qquad (15) \]
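
As a quick numerical reading of the bound for binary extension fields, the following sketch evaluates both ends of the range and the smallest field size $2^r$ that meets the lower end. The function name `field_size_bounds` is an assumption for illustration, and the integer `bit_length` trick is used in place of a floating-point logarithm.

```python
def field_size_bounds(n, t):
    """Return (lower, upper, smallest_binary_field) for n connections and t failures."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    lower = n - t + 1
    # For n >= 1, n.bit_length() equals ceil(log2(n + 1)).
    upper = 1 << n.bit_length()
    # Smallest power of two that is >= lower, and at least 2 so that it is a field.
    smallest_binary_field = max(2, 1 << (lower - 1).bit_length())
    return lower, upper, smallest_binary_field


print(field_size_bounds(10, 3))  # prints (8, 16, 8)
```

For $n = 10$ and $t = 3$ this gives $8 \le q \le 16$, so under this reading the field with eight elements already satisfies the lower bound.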



The following result shows the maximum admissible number of paths that can suffer from failures such that the decoding operations can still be achieved successfully.

Lemma 6: Let $n$ and $t$ be the number of connections and failures in the network model $\mathcal{N}$. Then we have $t \le \lfloor n/2 \rfloor$.

Proof: The proof is a direct consequence of the fact that the number of protection paths must be less than or equal to the number of working paths. ■
This lemma shows that one cannot provide protection paths better than duplicating the number of working paths.

VI. Network Protection Using Distributed 
Capacities and Network Coding 

In this section we develop a network protection strategy in which some connection paths have high priorities (less bandwidth, high demand). Let $n$ be the number of available connections (disjoint paths from sources to receivers). Let $m$ be the number of rounds in every cycle. In the previous strategy (NPS-T) we assumed that all connection paths have the same priority demand and working capacities. This might not be the case in real scenarios: connections that carry multimedia traffic have higher priority than connections that carry data traffic. Therefore, it is required to design network protection strategies based on the traffic and sender priorities.

Consider that the $n$ available working connections may use their bandwidth assignments in asymmetric ways. Some connections are less demanding in terms of bandwidth requirements than other connections that require full capacity frequently. Therefore, less demanding connections can transmit more protection packets, while other connections demand more bandwidth and can therefore transmit fewer protection packets throughout the transmission rounds. Let $m$ be the number of rounds and $t_\delta^i$ be the time of transmission in a cycle $\delta$ at round $i$. For a particular cycle $i$, let $t$ be the number of protection paths against $t$ failures that might affect the working paths. We will design a network protection strategy against $t$ arbitrary link failures (NPS-T2) as follows. Let the source $s_i$ send $d_i$ data packets and $p_i$ protection packets such that $d_i + p_i = m$. Put differently:

\[ \sum_{i=1}^{n} (d_i + p_i) = nm. \qquad (16) \]



In general we do not assume that $d_i = d_j$ and $p_i = p_j$. NPS-T2 is described as shown in Scheme (17).
The encoded data $y_i^j$ is given by



t \ ^ 1 

Vi = 2^ X k 



(18) 



We assume that the maximum number of failures that might occur in a particular cycle is $t$. Hence the number of protection paths (paths that carry encoded data) is $t$. The selection of the working and protection paths in every round is done using a priority demand function at the senders' side. It also depends on the traffic type and the service provided on these protection and working connections.

In Scheme (17), every connection $i$ is used to carry $d_i$ un-encoded data $x_i^1, x_i^2, \ldots, x_i^{d_i}$ (working paths) and $p_i$ encoded data $y_i^1, y_i^2, \ldots, y_i^{p_i}$ (protection paths) such that $d_i + p_i = m$.

Lemma 7: Let $t$ be the number of connection paths carrying encoded data in every round in NPS-T2. Then the normalized network capacity $C_{\mathcal{N}}$ is given by

\[ C_{\mathcal{N}} = (n - t)/n. \qquad (19) \]

Proof: The proof is straightforward from the fact that $t$ protection paths exist in every round; hence $n - t$ working paths are available throughout all $m$ rounds. ■
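
One way to see that unequal protection loads $p_i$ are compatible with exactly $t$ protection paths in every round is to build such a schedule explicitly. The sketch below is not the paper's priority function; it is a simple greedy rule, assumed here for illustration, that in each round picks the $t$ connections with the most protection packets still to send. It requires $\sum_i p_i = tm$ and $0 \le p_i \le m$, conditions implied by having $t$ protection paths in each of the $m$ rounds. The name `nps_t2_schedule` is hypothetical.

```python
def nps_t2_schedule(p, m, t):
    """p[i] is the number of protection packets of connection i in one cycle.

    Returns, for each of the m rounds, the sorted indices of the t connections
    that act as protection paths in that round.
    """
    if sum(p) != t * m or any(not 0 <= p_i <= m for p_i in p):
        raise ValueError("need sum(p) == t * m and 0 <= p_i <= m")
    remaining = list(p)
    rounds = []
    for _ in range(m):
        # Greedy choice: the t connections with the most protection packets left.
        chosen = sorted(range(len(p)), key=lambda i: -remaining[i])[:t]
        for i in chosen:
            remaining[i] -= 1
        rounds.append(sorted(chosen))
    return rounds


# n = 4 connections, m = 3 rounds, t = 2 protection paths per round.
print(nps_t2_schedule([2, 2, 1, 1], m=3, t=2))  # prints [[0, 1], [0, 1], [2, 3]]
```

Every round in the result has exactly $t$ protection paths, so the per-round normalized capacity stays at $(n - t)/n$ regardless of how the $p_i$ are distributed.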

VII. Analysis of the Protection Codes Over $\mathbb{F}_q$

We will prove the correctness of the protection codes over $\mathbb{F}_q$. Let $\mathbb{F}_q$ be a finite field with $q$ elements such that $q = p^r$ for some positive integer $r$ and prime $p$. We will derive a scheme to recover from any $m$ failures in the $n + m$ primary and protection paths. Let $t$ be the number of failures in the primary paths. We have three cases:

i) All failures occur in the primary paths, $t = m$. In this case we need to establish a system of $t$ linearly independent equations in $t$ variables.

ii) $t$ failures occur in the primary paths and $m - t$ failures occur in the protection paths. In this case we need to establish a system of equations to recover the failures in the primary paths only.

iii) All failures occur in the protection paths. No recovery process is needed in this case.

We will show the encoding operation in the case of directional connections from the senders to the receivers. Consider the worst case scenario in which $m = t$. We can describe the encoding scheme for multiple link failures as shown in (20).

All powers of $\alpha$ are taken modulo the field size, i.e., $\alpha^{ij \bmod q}$, where $q = n + 1$. In other words, if $q \ge n + 1$, then we have the encoding matrix

\[ \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ \alpha & \alpha^{2} & \alpha^{3} & \cdots & \alpha^{n} \\ \alpha^{2} & \alpha^{4} & \alpha^{6} & \cdots & \alpha^{2(n)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha^{t-1} & \alpha^{2(t-1)} & \alpha^{3(t-1)} & \cdots & \alpha^{(t-1)(n)} \end{bmatrix} \qquad (20) \]



In this case we have $\alpha^{q-1} = 1$, where $q$ is a prime power.

The first column represents the coefficients of the encoding data at the first sender. Also, the first row represents the binary coefficients of all senders in case of a single link failure. Hence the $i$-th column represents the coefficients of the encoding data at the $i$-th sender for all $1 \le i \le n - 1$.

In general, for multiple $m = t$ failures, the encoding data in the $j$-th protection path is given by

\[ y_{n+j} = \sum_{i=1}^{n} \alpha^{i(j-1) \bmod q} x_i, \qquad (21) \]

for $1 \le j \le m$.

As a matter of fact, a square sub-matrix of $t$ columns of the encoding scheme (20) is invertible (has a full rank) if and only if its determinant is not equal to zero [6]. We will show that for any $t$ arbitrary link failures, the receivers are able to form a system of $t$ linearly independent equations and recover the lost data.



Lemma 8: If there are $t$ link failures in the primary paths, then the receivers are able to recover from those failures using $t$ protection paths.

Proof: Let [...] by the second element $\alpha^{j_1}$. Choosing any $t$ arbitrary columns [...] (22)

Hence we have a system of $t$ equations in $t$ variables. Clearly, all elements in each row are different. Indeed, this system has a determinant of the form [6, Theorem 6.5.5]

\[ \alpha^{j_1 + j_2 + j_3 + \cdots + j_t} \prod_{h > \ell} \left(\alpha^{j_h} - \alpha^{j_\ell}\right) \neq 0, \qquad (23) \]

which proves the result. ■

Now, we shall prove the general case that any $\mu \times \mu$ square sub-matrix of the matrix (20) has a full rank. Assume the square matrix is represented by

\[ B = \begin{bmatrix} \alpha^{i_1 j_1} & \cdots & \alpha^{i_1 j_\mu} \\ \vdots & \ddots & \vdots \\ \alpha^{i_\mu j_1} & \cdots & \alpha^{i_\mu j_\mu} \end{bmatrix}, \qquad (24) \]

where $1 \le \mu \le n$ and $\alpha^{i_i j_i} \in \mathbb{F}_q$.

Lemma 9: The sub-matrix $B$ described in (24) has a full rank.

Proof: We proceed by mathematical induction.

i) We first prove that any $2 \times 2$ sub-matrix of $B$ has a full rank. This means that the four elements that lie in the corners are not alike (do not share a common factor). Put differently, for $i \ne j$ and $\ell \ne 1$, consider

\[ \begin{bmatrix} \alpha^{i} & \alpha^{j} \\ \alpha^{\ell i} & \alpha^{\ell j} \end{bmatrix}. \qquad (25) \]

If we divide the second row by $\alpha^{(\ell-1)i}$, we obtain the row $(\alpha^{i}, \alpha^{\ell j - (\ell-1)i})$. Now assume, by contradiction, that this row equals the first row, i.e., $\alpha^{\ell j - (\ell-1)i} = \alpha^{j}$, or $\alpha^{(\ell-1)i} = \alpha^{(\ell-1)j} \bmod q$. Obviously, this contradicts the fact that $\ell \ne 1$ and $i \ne j$. In addition, $(\ell - 1)(j - i) = 0 \bmod q$ contradicts the fact about the field order. Hence, the result is a consequence.

ii) Now, assume the matrix

\[ B_{\mu-1} = \begin{bmatrix} \alpha^{i_1 j_1} & \alpha^{i_1 j_2} & \cdots & \alpha^{i_1 j_{\mu-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha^{i_{\mu-1} j_1} & \alpha^{i_{\mu-1} j_2} & \cdots & \alpha^{i_{\mu-1} j_{\mu-1}} \end{bmatrix} \qquad (26) \]

has a full rank.

iii) We will add an arbitrary row and column to the matrix $B_{\mu-1}$ to construct the matrix $B$.



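\[ B = \begin{bmatrix}
\alpha^{i_1 j_1} & \alpha^{i_1 j_2} & \cdots & \alpha^{i_1 j_{\mu-1}} & \alpha^{i_1 j_\mu} \\
\alpha^{i_2 j_1} & \alpha^{i_2 j_2} & \cdots & \alpha^{i_2 j_{\mu-1}} & \alpha^{i_2 j_\mu} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\alpha^{i_{\mu-1} j_1} & \alpha^{i_{\mu-1} j_2} & \cdots & \alpha^{i_{\mu-1} j_{\mu-1}} & \alpha^{i_{\mu-1} j_\mu} \\
\alpha^{i_\mu j_1} & \alpha^{i_\mu j_2} & \cdots & \alpha^{i_\mu j_{\mu-1}} & \alpha^{i_\mu j_\mu}
\end{bmatrix} \qquad (27) \]

All elements in the last column are different, and all elements in the last row are different. Since $\alpha^{i_i j_j}$ is an element of $\mathbb{F}_q$, it has a unique inverse. Therefore, we can divide every row in the matrix $B$ by the element in the last column. Hence, we have

\[ B' = \begin{bmatrix}
\alpha^{i'_1 j'_1} & \alpha^{i'_1 j'_2} & \cdots & \alpha^{i'_1 j'_{\mu-1}} & 1 \\
\alpha^{i'_2 j'_1} & \alpha^{i'_2 j'_2} & \cdots & \alpha^{i'_2 j'_{\mu-1}} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\alpha^{i'_\mu j'_1} & \alpha^{i'_\mu j'_2} & \cdots & \alpha^{i'_\mu j'_{\mu-1}} & 1
\end{bmatrix} \qquad (28) \]

All powers of $\alpha$ are taken modulo $q$. Furthermore, all elements in each row (or column) are pairwise distinct. The matrix $B'$ is similar to the matrix shown in (22). Using Lemma 8, the matrix $B'$ has a full rank given by $\mu$. ■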
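
The Vandermonde-type determinant used in the proof of Lemma 8 can be checked numerically. The sketch below rests on assumptions that are not taken from the paper: it works over the prime field $\mathbb{F}_7$ with $\alpha = 3$, and it takes the $t \times t$ system to have entries $\alpha^{i j_k}$ for rows $i = 1, \ldots, t$ and chosen column indices $j_1, \ldots, j_t$. For that matrix the determinant equals $\alpha^{j_1 + \cdots + j_t} \prod_{h > \ell} (\alpha^{j_h} - \alpha^{j_\ell})$, which is nonzero whenever the $\alpha^{j_k}$ are distinct. The function names are hypothetical.

```python
from itertools import combinations


def det_mod_p(M, p):
    """Determinant over F_p by Gaussian elimination (Python 3.8+ for pow(x, -1, p))."""
    M = [[v % p for v in row] for row in M]
    k, det = len(M), 1
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign
        det = (det * M[col][col]) % p
        inv = pow(M[col][col], -1, p)
        for r in range(col + 1, k):
            f = (M[r][col] * inv) % p
            M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return det % p


def closed_form(js, alpha, p):
    """alpha^(j_1 + ... + j_t) times the product of (alpha^j_h - alpha^j_l) for h > l."""
    xs = [pow(alpha, j, p) for j in js]
    value = 1
    for x in xs:
        value = (value * x) % p
    for h in range(len(xs)):
        for l in range(h):
            value = (value * (xs[h] - xs[l])) % p
    return value


p, alpha, t = 7, 3, 3
for js in combinations(range(1, 7), t):
    M = [[pow(alpha, i * j, p) for j in js] for i in range(1, t + 1)]
    assert det_mod_p(M, p) == closed_form(js, alpha, p) != 0
print(closed_form((1, 2, 4), alpha, p))  # prints 1
```

The loop confirms, for every choice of three columns among six, that the determinant matches the closed form and is nonzero, so the corresponding system of equations has a unique solution.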



VIII. Conclusion 

In this paper we demonstrated the encoding operations of network protection codes defined over finite fields. We derived a bound on the minimum field size required for choosing unique coefficients of data sent on the working paths. In addition, we presented a scheme for differentiated services in cases where some working paths have high priorities in terms of bandwidth and capacity assignments.